CLAIMS 



1. A computer-implemented method for retrieving documents comprising:

inputting the text of one or more documents, wherein each document includes human readable words;

creating context windows around each said word in each document;

generating a statistical evaluation of the characteristics of all of the windows, wherein the results are not a function of the order of the appearance of words within each window; and

combining the results of the statistical evaluation for each window.



2. The method according to Claim 1 further comprising:

determining the likelihood of documents having predetermined characteristics based on the combined statistical evaluation for each window.

3. The method according to Claim 2 further comprising:

assigning a document identifier to each document and context window position; and

determining the document identifier of at least one document having said predetermined characteristics.
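A minimal Python sketch of the identifier lookup in Claim 3 follows. It assumes each window already carries a (document identifier, position) label and a numeric score for the predetermined characteristic. The function name `best_documents` and the choice to average window scores per document are illustrative assumptions, not the claimed implementation.

```python
from collections import defaultdict


def best_documents(window_labels, window_scores):
    """Return identifiers of the document(s) with the highest combined score.

    Assumptions (not from the claims): window_labels[i] is the
    (document identifier, position) pair assigned to window i,
    window_scores[i] is that window's score for the predetermined
    characteristic, and windows are combined per document by averaging.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (doc_id, _position), score in zip(window_labels, window_scores):
        sums[doc_id] += score
        counts[doc_id] += 1
    averages = {doc_id: sums[doc_id] / counts[doc_id] for doc_id in sums}
    top = max(averages.values())
    # Return every identifier that ties for the best average score.
    return sorted(doc_id for doc_id, avg in averages.items() if avg == top)
```

For example, `best_documents([("d1", 0), ("d1", 1), ("d2", 0)], [0.2, 0.4, 0.9])` returns `["d2"]`.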

4. The method according to Claim 1 further comprising:

defining a plurality of document categories; and

determining the category of a particular document based on the combined statistical evaluation for each window.

5. The method according to Claim 1 further comprising:

determining the word that is in the center of a particular window based on the combined statistical evaluation for each window.

6. The method according to Claim 1 wherein the step of generating a statistical evaluation further includes counting the occurrences of particular words and particular documents and tabulating totals of the counts.



7. The method according to Claim 6 wherein the step of generating a statistical evaluation further includes the step of generating counts about singular word occurrences and about pair-wise occurrences.
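The counting of Claims 6 and 7 can be sketched in Python as below. The claims do not say what a "pair" is; this sketch assumes a pair-wise count is a (context word, target) co-occurrence, where the target is whatever label the window is associated with (a document identifier, a category, or a center word). Windows are assumed to be `Counter` bags of words, and the name `tabulate_counts` is hypothetical.

```python
from collections import Counter


def tabulate_counts(windows, targets):
    """Tabulate singular and pair-wise counts over labeled windows.

    Assumption: a "pair" is (context word, target), where targets[i] is
    the label (e.g. document identifier) associated with windows[i].
    Each window is a Counter mapping word -> occurrences in that window.
    """
    word_counts = Counter()    # singular: total occurrences of each context word
    target_counts = Counter()  # singular: number of windows per target
    pair_counts = Counter()    # pair-wise: (word, target) co-occurrences
    for window, target in zip(windows, targets):
        target_counts[target] += 1
        for word, n in window.items():
            word_counts[word] += n
            pair_counts[(word, target)] += n
    return word_counts, target_counts, pair_counts
```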

8. The method according to Claim 7 further comprising the step of pruning the number of pair-wise counts.

9. The method according to Claim 8 wherein the step of pruning further includes the steps of monitoring the amount of memory used for the pair-wise counts and pruning when a predetermined threshold of memory has been exceeded for the pair-wise counts.
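One possible reading of the memory-triggered pruning in Claims 8 and 9 is sketched below. Two assumptions are made that the claims leave open: the number of stored entries stands in for the amount of memory used, and pruning keeps only the most frequent pairs (here, the top half of `max_entries`). The helper name `add_pair` is hypothetical.

```python
from collections import Counter


def add_pair(pair_counts, pair, max_entries):
    """Increment a pair-wise count, pruning when the table exceeds a threshold.

    Assumptions: len(pair_counts) is used as a proxy for memory, and
    pruning discards the lowest-count pairs until only max_entries // 2
    of the most frequent pairs remain.
    """
    pair_counts[pair] += 1
    if len(pair_counts) > max_entries:
        keep = max_entries // 2
        # most_common() orders entries by descending count.
        survivors = dict(pair_counts.most_common(keep))
        pair_counts.clear()
        pair_counts.update(survivors)
    return pair_counts
```

Halving the table, rather than removing one entry at a time, keeps pruning infrequent at the cost of discarding more low-count pairs at once.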
10. The method according to Claim 6 wherein the step of generating a statistical evaluation further includes the step of determining probabilities of particular words appearing in particular documents based on the counts.



11. The method according to Claim 10 wherein the step of generating a statistical evaluation further includes determining conditional probabilities of particular words appearing in particular documents based on the counts.
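The probability estimates of Claims 10 and 11 can be sketched from the tabulated counts as follows. The additive (Laplace) smoothing parameter `alpha` is an assumption not found in the claims; it keeps word and document pairs that were never counted from receiving zero probability. `target_word_totals` (the total number of context words observed with each target) and the function names are hypothetical.

```python
from collections import Counter  # inputs are expected to be Counters, so unseen keys read as 0


def prior(target, target_counts):
    """Estimate P(target) as the fraction of windows labeled with target."""
    return target_counts[target] / sum(target_counts.values())


def conditional_prob(word, target, pair_counts, target_word_totals, vocab_size, alpha=1.0):
    """Estimate P(word | target) from pair-wise counts.

    Assumption: additive (Laplace) smoothing with alpha; with alpha=0
    this reduces to the plain relative frequency.
    """
    numerator = pair_counts[(word, target)] + alpha
    denominator = target_word_totals[target] + alpha * vocab_size
    return numerator / denominator
```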

12. The method according to Claim 11 further comprising the step of calculating a conditional probability based on a Simple Bayes statistical model.
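A Simple (naive) Bayes calculation in the sense of Claim 12 might look like the sketch below. It assumes context words are conditionally independent given the target and reuses the Laplace smoothing assumed above. The target could be a document identifier, a category (Claim 4), or a center word (Claim 5); that mapping is an interpretation, not something the claims spell out. Because each window is a bag of words, the result does not depend on word order.

```python
import math
from collections import Counter


def simple_bayes_posterior(window, target_counts, pair_counts, target_word_totals,
                           vocab_size, alpha=1.0):
    """Posterior P(target | window) under a Simple (naive) Bayes model.

    Assumptions: words in the window are conditionally independent given
    the target, and P(word | target) uses additive smoothing with alpha.
    window is a Counter mapping word -> occurrences.
    """
    total_windows = sum(target_counts.values())
    log_scores = {}
    for target, n_target in target_counts.items():
        score = math.log(n_target / total_windows)  # log prior
        for word, n in window.items():
            p = (pair_counts[(word, target)] + alpha) / (
                target_word_totals[target] + alpha * vocab_size)
            score += n * math.log(p)  # one factor per occurrence
        log_scores[target] = score
    # Normalize in log space to avoid floating-point underflow.
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {t: math.exp(s - m) / z for t, s in log_scores.items()}
```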

13. The method according to Claim 1 wherein the step of creating context windows around each word further comprises the step of selecting the words appearing before and after each word by a predetermined amount in the document and including those selected words in the window.

14. The method according to Claim 13 wherein the word around which each window is created is not included in the window.
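Window construction per Claims 13 and 14 can be sketched as follows. The helper name `context_windows` is hypothetical, and `k` stands in for the "predetermined amount" of words taken on each side. The center word is excluded, and storing each window as a `Counter` discards word order, matching the order-independence recited in Claim 1.

```python
from collections import Counter


def context_windows(words, k):
    """Return one bag-of-words window per word position.

    k (hypothetical) is the number of words taken before and after each
    center word; windows are truncated at document boundaries. The
    center word itself is excluded from its own window.
    """
    windows = []
    for i in range(len(words)):
        left = words[max(0, i - k):i]
        right = words[i + 1:i + 1 + k]
        windows.append(Counter(left + right))  # Counter ignores order
    return windows
```

For `"the cat sat on the mat".split()` with `k=1`, the window around `sat` is `Counter({"cat": 1, "on": 1})`.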

15. The method according to Claim 1 further comprising normalizing the combined results of the statistical evaluation for the windows.
16. The method according to Claim 1 wherein the step of generating a statistical evaluation further comprises determining a measure of mutual information.
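The claims do not fix a formula for the "measure of mutual information" in Claim 16. The sketch below uses pointwise mutual information (PMI) between a context word and a target as one plausible choice, with maximum-likelihood probabilities taken from the pair-wise count table. The function name is hypothetical.

```python
import math
from collections import Counter


def pointwise_mutual_information(word, target, pair_counts):
    """PMI(word, target) = log2( P(word, target) / (P(word) * P(target)) ).

    Assumption: PMI is used as the mutual-information measure, with all
    probabilities estimated from the (word, target) Counter pair_counts.
    Returns -inf when the pair was never observed.
    """
    total = sum(pair_counts.values())
    joint = pair_counts[(word, target)] / total
    if joint == 0:
        return float("-inf")
    p_word = sum(n for (w, _), n in pair_counts.items() if w == word) / total
    p_target = sum(n for (_, t), n in pair_counts.items() if t == target) / total
    return math.log2(joint / (p_word * p_target))
```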



17. The method according to Claim 1 wherein the step of combining includes averaging probability assessments.
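The averaging of Claim 17, followed by the normalization of Claim 15, can be sketched as below. Each window's probability assessment is assumed to be a dict mapping target to probability (for example, the output of a Simple Bayes model); a target missing from a window is treated as probability 0. The function name `combine_windows` is hypothetical.

```python
def combine_windows(posteriors):
    """Average per-window probability assessments, then renormalize.

    posteriors: list of dicts mapping target -> probability, one per
    window. Targets absent from a window's dict count as 0. Returns an
    empty dict for an empty input.
    """
    targets = set().union(*posteriors)
    avg = {t: sum(p.get(t, 0.0) for p in posteriors) / len(posteriors)
           for t in targets}
    total = sum(avg.values())
    # Renormalize so the combined assessment sums to 1.
    return {t: v / total for t, v in avg.items()} if total > 0 else avg
```

Averaging, unlike multiplying per-window probabilities, keeps a single unusual window from driving a document's combined score to zero.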

18. A computer system comprising:

a storage unit for receiving and storing a plurality of documents, wherein each document includes human readable words;

means for creating context windows around each said word in each document;

means for generating a statistical evaluation of the content of each window, wherein the order of the appearance of words within each window is not used in the statistical evaluation;

means for combining the results of the statistical evaluation for each window; and

means for determining the probabilities of documents having predetermined characteristics based on the combined statistical evaluation for each window.

19. The computer system according to Claim 18 further comprising:

a document identifier assigned to each document; and

means for determining the document identifier of at least one document having said predetermined characteristics.
20. The computer system according to Claim 18 further comprising:

a plurality of document categories; and

means for determining the category of a particular document based on the combined statistical evaluation for each window.

21. The computer system according to Claim 18 further comprising:

means for determining the word that is in the center of a particular window based on the combined statistical evaluation for each window.

22. The computer system according to Claim 18 wherein the means for generating a statistical evaluation further includes means for counting the occurrences of particular words and particular documents and tabulating totals of the counts.

23. The computer system according to Claim 22 wherein the means for generating a statistical evaluation further includes means for determining probabilities of particular words appearing in particular documents based on the counts.

24. The computer system according to Claim 23 wherein the means for generating a statistical evaluation further includes means for determining conditional probabilities of particular words appearing in particular documents based on the counts.
25. The computer system according to Claim 18 wherein the means for creating context windows around each word further comprises means for selecting the words appearing before and after each word by a predetermined amount in the document and including those selected words in the window.

26. A computer program product comprising:

a computer program storage device;

computer-readable instructions on the storage device for causing a computer to undertake method acts to facilitate retrieving documents, the method acts comprising:

inputting the text of one or more documents, wherein each document includes human readable words;

creating context windows around each said word in each document;

generating a statistical evaluation of the characteristics of each window, wherein the results are not a function of the order of the appearance of words within each window; and

combining the results of the statistical evaluation for each window.

27. The computer program product according to Claim 26 further comprising:

determining the likelihood of documents having predetermined characteristics based on the combined statistical evaluation for each window.